14 research outputs found

    Convolutional Neural Network Architectures for Gender, Emotional Detection from Speech and Speaker Diarization

    This paper introduces three system architectures for speaker identification that aim to overcome the limitations of diarization and voice-based biometric systems. Diarization systems utilize unsupervised algorithms to segment audio data based on the time boundaries of utterances, but they do not distinguish individual speakers. Voice-based biometric systems, on the other hand, can only identify individuals in recordings with a single speaker. Identifying speakers in recordings of natural conversations can be challenging, especially when emotional shifts alter voice characteristics, making gender identification difficult. To address this, the proposed architectures incorporate gender, emotion, and diarization techniques at either the segment or the group level. The architectures were evaluated on two speech databases, VoxCeleb and RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song). The findings reveal that the segment-level strategy outperforms the group-level strategy in terms of recognition results, despite the real-time processing advantage of the latter. The proposed architectures effectively address the challenge of identifying multiple speakers in a conversation while accounting for emotional changes that affect speech. Gender and emotion classification of the diarization output achieves an accuracy of over 98 percent, suggesting that the proposed speech-based approach can achieve highly accurate speaker identification.
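As an illustration of the group-level diarization idea, the sketch below greedily clusters hypothetical segment embeddings by cosine similarity. The embeddings, the threshold, and the greedy assignment rule are illustrative assumptions, not the paper's actual method.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def group_segments(embeddings, threshold=0.8):
    """Greedy grouping: each segment joins the first group whose
    reference embedding (its first member) it resembles closely
    enough, otherwise it starts a new group."""
    groups = []  # list of (reference embedding, [segment indices])
    for i, emb in enumerate(embeddings):
        for reference, members in groups:
            if cosine(emb, reference) >= threshold:
                members.append(i)
                break
        else:
            groups.append((list(emb), [i]))
    return [members for _, members in groups]

# Four toy segment embeddings: two apparent speakers.
segs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
print(group_segments(segs))  # [[0, 1], [2, 3]]
```

A real system would use learned speaker embeddings per segment; the grouping logic above stays the same.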

    A New Predictive Methodology Based on a Sequence-to-Sequence Model for Transforming Esophageal Speech into Laryngeal Voice

    As the health situation did not allow the 9th Journées de Phonétique Clinique to be held under suitable conditions (i.e., in person), the programme committee decided to cancel the 2021 edition and to organise instead a one-day event, held on 27 May 2021, dedicated to presenting the accepted contributions.

    Non-Parallel Voice Conversion System Using An Auto-Regressive Model

    Many existing voice conversion (VC) systems are attractive owing to their high performance in terms of voice quality and speaker similarity. Nevertheless, without parallel training data, some generated waveform trajectories are not smooth, leading to degraded sound quality and mispronunciation issues in the converted speech. To address these shortcomings, this paper proposes a non-parallel VC system based on an autoregressive model, Phonetic PosteriorGrams (PPGs), and an LPCNet vocoder to generate high-quality converted speech. The proposed autoregressive structure enables our system to produce the next step's outputs from the previous step's acoustic features. Further, PPGs, being speaker-independent, allow any unknown source speaker to be converted into a specific target speaker. We evaluate the effectiveness of our system by performing any-to-one conversion between native English speakers. Objective and subjective measures show that our method outperforms the best non-parallel VC method of the Voice Conversion Challenge 2018 in terms of naturalness and speaker similarity.
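The speaker-independence of PPGs comes from their being per-frame posterior distributions over phonetic classes rather than raw acoustic features. A minimal sketch, assuming toy ASR logits over a hypothetical three-phone inventory:

```python
import math

def softmax(logits):
    # Numerically stable softmax over one frame's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical ASR acoustic-model outputs: one logit vector per frame
# over a tiny phone inventory ["AA", "IY", "S"] (illustrative only).
frame_logits = [[2.0, 0.1, -1.0], [0.2, 3.0, 0.0]]
ppg = [softmax(f) for f in frame_logits]

for row in ppg:
    # Each PPG frame is a probability distribution over phone classes.
    assert abs(sum(row) - 1.0) < 1e-9
print([max(range(3), key=lambda i: row[i]) for row in ppg])  # [0, 1]
```

Because the distribution describes *what* is said rather than *who* says it, a conversion model can consume these frames for any unseen source speaker.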

    Deep Learning-Based Patch-Wise Illumination Estimation for Enhanced Multi-Exposure Fusion

    This article suggests a unique technique for multi-exposure fusion using convolutional neural networks (CNNs) for patch-wise illumination estimation. Multi-exposure fusion is a crucial component of enhancing image quality, particularly in scenes with uneven lighting. Our proposed approach makes use of CNNs' capability to predict light levels within specific image patches in order to adjust exposure levels accurately. We examine the theoretical foundations of our approach, emphasising the advantages of patch-wise estimation in capturing intricate lighting details. Additionally, we present experimental results demonstrating enhanced dynamic range expansion and image detail preservation, showing that our methodology is more effective than conventional fusion methods. This study advances the state of the art in multi-exposure fusion while also opening up new prospects for computational photography, surveillance, and computer vision applications.
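To make the patch-wise idea concrete, the sketch below estimates each patch's illumination as its mean luminance and blends two exposures by how close each estimate is to a mid-grey target. The CNN predictor is replaced here by the simple mean, purely for illustration; the weighting and blending logic is the part being demonstrated.

```python
def mean_luma(patch):
    # Average intensity of a patch (a list of pixel rows in [0, 1]).
    vals = [p for row in patch for p in row]
    return sum(vals) / len(vals)

def fuse_patches(patch_under, patch_over, target=0.5):
    """Weight each exposure by the inverse distance of its estimated
    illumination to a mid-grey target, then blend pixel-wise."""
    w_u = 1.0 / (1e-6 + abs(mean_luma(patch_under) - target))
    w_o = 1.0 / (1e-6 + abs(mean_luma(patch_over) - target))
    s = w_u + w_o
    w_u, w_o = w_u / s, w_o / s  # normalise weights to sum to 1
    return [[w_u * a + w_o * b for a, b in zip(ra, rb)]
            for ra, rb in zip(patch_under, patch_over)]

under = [[0.1, 0.1], [0.1, 0.1]]  # under-exposed toy patch
over = [[0.9, 0.9], [0.9, 0.9]]   # over-exposed toy patch
fused = fuse_patches(under, over)
print(round(fused[0][0], 3))  # 0.5 — equal weights, mid-grey result
```

In the proposed method, the learned per-patch illumination estimate would replace `mean_luma`, letting the weights adapt to complex lighting instead of a fixed statistic.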

    Harris Hawks Optimization for Ambulance Routing in Smart Cities

    The ambulance routing problem is a capacitated variant of the vehicle routing problem. It deals with injured people and their rescue requests. The main aim of the ambulance routing problem is therefore to determine the minimum (i.e., optimal) distances between: 1) the accident sites and the ambulance station; and 2) the accident sites and the nearest hospital. Despite the efforts proposed in the literature, determining the optimal route remains challenging. This article therefore tackles ambulance vehicle routing in smart cities using the Harris Hawks Optimization (HHO) algorithm, which attempts to reach the victims as quickly and reliably as possible. Several engineering optimization problems confirm that HHO outperforms many well-known swarm intelligence approaches. In our system, we use a node approach to produce a city map. Initially, the control station receives the accident-site information and sends it to the hospital and the ambulance. The HHO vehicle routing algorithm receives data from the driver, including the location of the accident and the node position of the ambulance vehicle. The HHO then computes the driver's shortest route to the accident scene. The driver updates the accident and hospital locations once the vehicle reaches the accident site, and the fastest route (the one with the least travel time) to the hospital is then determined. The HHO can provide offline information for any potential combination of source and destination coordinates. Extensive simulation experiments demonstrated that the HHO can provide optimal solutions, and performance evaluation experiments demonstrated the superiority of the HHO algorithm over its counterparts (the SAODV, TVR, and TBM methods). Furthermore, for ten malicious nodes, the PDF of the algorithm was 0.91, higher than that of its counterparts.
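The node-map routing step can be sketched with a classical shortest-path search. This stand-in uses Dijkstra's algorithm rather than the HHO metaheuristic, and the toy city graph (station, intermediate nodes, accident site) is an assumption made for illustration.

```python
import heapq

def shortest_route(graph, src, dst):
    """Dijkstra's shortest path on a node map. `graph` maps each node
    to a list of (neighbour, distance) edges."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Reconstruct the route by walking predecessors back to the source.
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return list(reversed(path)), dist[dst]

city = {"station": [("A", 2), ("B", 5)],
        "A": [("accident", 4)],
        "B": [("accident", 1)]}
print(shortest_route(city, "station", "accident"))
# (['station', 'A', 'accident'], 6)
```

A metaheuristic such as HHO explores candidate routes stochastically instead, which pays off when the objective includes factors (traffic, time windows) that break the assumptions of exact shortest-path search.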

    Any-to-One Non-Parallel Voice Conversion System Using an Autoregressive Conversion Model and LPCNet Vocoder

    We present an any-to-one voice conversion (VC) system, using an autoregressive model and LPCNet vocoder, aimed at enhancing the converted speech in terms of naturalness, intelligibility, and speaker similarity. As the name implies, non-parallel any-to-one voice conversion does not require paired source and target speech and can be employed for arbitrary conversion tasks. Recent advancements in neural vocoders, such as WaveNet, have improved the efficiency of speech synthesis. However, in practice, we find that the trajectory of some generated waveforms is not consistently smooth, leading to occasional voice errors. To address this issue, we propose using an autoregressive (AR) conversion model along with the high-fidelity LPCNet vocoder. This combination not only solves the problem of waveform fluidity but also produces more natural and clear speech, with the added capability of real-time speech generation. To precisely represent the linguistic content of a given utterance, we use speaker-independent PPG features (SI-PPG) computed from an automatic speech recognition (ASR) model trained on a multi-speaker corpus. A conversion model then maps the SI-PPG to the acoustic representations used as input features for LPCNet. The proposed autoregressive structure enables our system to produce each prediction step's outputs from the acoustic features predicted in the previous step. We evaluate the effectiveness of our system by performing any-to-one conversion between native English speakers. Experimental results show that the proposed method outperforms state-of-the-art systems, producing higher speech quality and greater speaker similarity.
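The autoregressive structure described above can be sketched as a loop in which each acoustic frame is predicted from the current linguistic frame and the previous output. The scalar "frames" and the linear step function below are toy stand-ins for PPG vectors and the trained network; only the feedback pattern is the point.

```python
def convert(ppg_frames, step):
    """Autoregressive conversion: each acoustic frame is predicted
    from the current PPG frame and the previously predicted frame."""
    prev = 0.0  # assumed initial acoustic state
    out = []
    for ppg in ppg_frames:
        prev = step(ppg, prev)  # feedback: output re-enters the model
        out.append(prev)
    return out

# Toy 'model': a smoothed linear map (stand-in for the trained network).
alpha = 0.7  # weight on the current linguistic frame
step = lambda ppg, prev: alpha * ppg + (1 - alpha) * prev

frames = [1.0, 1.0, 0.0, 0.0]
print([round(x, 3) for x in convert(frames, step)])
# [0.7, 0.91, 0.273, 0.082] — transitions are smoothed, not abrupt
```

The smoothing visible in the toy output is exactly why the AR structure helps waveform fluidity: each frame is anchored to its predecessor rather than predicted independently.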

    Intelligibility Improvement of Esophageal Speech Using Sequence-to-Sequence Voice Conversion with Auditory Attention

    Laryngectomees are individuals whose larynx has been surgically removed, usually due to laryngeal cancer. The immediate consequence of this operation is that they are unable to speak. Esophageal speech (ES) remains the preferred alternative speaking method for laryngectomees. However, compared to the laryngeal voice, ES is characterized by low intelligibility and poor quality due to a chaotic fundamental frequency (F0), specific noises, and low intensity. Our proposal for solving these problems is to take advantage of voice conversion as an effective way to improve speech quality and intelligibility. To this end, we propose a novel esophageal-to-laryngeal voice conversion (VC) system based on a sequence-to-sequence (Seq2Seq) model combined with an auditory attention mechanism. The originality of the proposed framework is that it adopts an auditory attention technique, which leads to more efficient and adaptive feature mapping. In addition, our VC system does not require the classical DTW alignment process during the learning phase, which avoids erroneous mappings and significantly reduces computation time. Moreover, to preserve the identity of the target speaker, the excitation and phase coefficients are estimated by querying a binary search tree. In experiments, objective and subjective tests confirmed that the proposed approach performs better in terms of speech quality and intelligibility, even in some difficult cases.
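An attention mechanism of this kind can be illustrated with plain dot-product attention: score each encoder state against the decoder query, normalise the scores with softmax, and form a weighted context vector. The tiny encoder states below are illustrative assumptions, not features from the actual system.

```python
import math

def attention(query, keys, values):
    """Dot-product attention: score each encoder state against the
    query, softmax-normalise, and return the weighted context."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

# Hypothetical encoder states for three source-speech frames.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = keys
ctx, w = attention([1.0, 0.0], keys, values)
print(w[0] > w[1])  # True — the matching frame gets more weight
```

Because the weights are recomputed at every decoding step, the model aligns source and target frames adaptively, which is what removes the need for a fixed DTW alignment.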

    Optimum Feature Selection with Particle Swarm Optimization to Face Recognition System Using Gabor Wavelet Transform and Deep Learning

    In this study, the Gabor wavelet transform, combined with the strength of deep learning, is presented as a new approach for symmetry face databases. A face recognition system was developed for use in different applications. We used the Gabor wavelet transform to extract features from the symmetry face training data, and then a deep learning method for recognition. We implemented and evaluated the proposed method on the ORL and YALE databases with MATLAB 2020a. The same experiments were also conducted with particle swarm optimization (PSO) as the feature selection approach. Gabor wavelet feature extraction with a large number of training image samples proved more effective than the other methods in our study. Without PSO, the recognition rate is 85.42% on the ORL database and 92% on the YALE database; applying the PSO algorithm increased the accuracy to 96.22% for ORL and 94.66% for YALE.
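For reference, a Gabor filter of the kind used for feature extraction can be generated directly from its closed form: a Gaussian envelope modulating a sinusoidal carrier oriented at an angle. The parameter values below are arbitrary examples, not the study's settings.

```python
import math

def gabor_kernel(size, sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter: a Gaussian envelope (scale sigma,
    aspect gamma) modulating a cosine carrier (wavelength lam, phase
    psi) oriented at angle theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma ** 2 * yr * yr)
                                / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lam + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

k = gabor_kernel(5, sigma=2.0, theta=0.0, lam=4.0)
print(round(k[2][2], 3))  # 1.0 — the centre responds maximally
```

A filter bank for feature extraction is built by sweeping `theta` and `lam` over several orientations and scales and convolving each kernel with the face image.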

    Deep learning methods for early detection of Alzheimer’s disease using structural MR images: A survey

    In this paper, we present an extensive review of the most recent works on Alzheimer’s disease (AD) prediction, particularly Mild Cognitive Impairment (MCI) conversion prediction. We aimed to identify the most useful brain magnetic resonance imaging (MRI) biomarkers as well as the most successful deep learning frameworks used for prediction. To achieve this, we analysed more than 130 works and 7 review articles. A closer look revealed that the hippocampus is an important region of interest (ROI) affected early by AD and that many related features help detect the disease in its early stages. Considered alone, however, this ROI is not sufficient to ensure high prediction performance; many other brain regions can provide additional information to improve prediction accuracy. Among state-of-the-art deep neural networks, the U-Net represents the most efficient architecture for hippocampus segmentation, and the best Dice Similarity Coefficient (DSC) value, equal to 94%, was achieved by the RESU-Net architecture. The best results for MCI conversion prediction were obtained with two models that identify significant landmarks from the whole brain for classification. The multi-stream convolutional neural network achieved the best AUC and specificity of 94.39% and 99.70%, respectively. Finally, a region ensemble model delivered the best accuracy of 85.90%, demonstrating the need for further research on this challenging problem.
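The Dice Similarity Coefficient used above to score hippocampus segmentations is simple to compute from two binary masks; the flat toy masks below stand in for voxel grids.

```python
def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks:
    DSC = 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(mask_a) + sum(mask_b)
    return 2.0 * inter / total if total else 1.0

# Toy predicted and ground-truth masks (flattened voxel labels).
pred = [1, 1, 1, 0, 0]
truth = [1, 1, 0, 0, 1]
print(round(dice(pred, truth), 3))  # 0.667 — 2*2 overlap over 3+3
```

A DSC of 94%, as reported for RESU-Net, means the overlap term accounts for nearly all voxels in both the predicted and reference hippocampus masks.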